ABSTRACT

This paper develops a stimulus selection theory,
based on an extensive review of previous research, which gives weight
to context change or stimulus generalization decrement. The theory
assumes no special compounding or configurational process, and
accounts for the learning of successive discriminations without the
addition of any special process. The theory predicts the relative
rates of acquiring simultaneous and successive discrimination,
including the "exceptions," and leads to correct predictions in a
number of other paradigms. A computer simulation which embodies the
context-sensitive theory confirms the predictions of the context
theory of discrimination learning, which has direct implications for
research on types of learning process. Component, compound, and
configurational learning emerge as summary descriptions of
performance in different situations, but according to the present
theory are neither styles nor distinct types of learning since data
from the various situations are predicted by a single process.
(Author/HMV)

A Context-Sensitive Theory of Discrimination Learning

Douglas L. Medin 
The Rockefeller University 

I. Stimulus vs. Response Selection 

Consider an educated rat running in a T-maze on a bright- 
ness discrimination task. On each trial the rat runs down to 
the choice point. On some trials the reward is on the left 
and on others, reward is on the right. When the reward is on 
the left, a black stimulus is on the left at the choice point 
and a white stimulus is on the right; when reward is on the 
right, the black stimulus is on the right and the white is on 
the left. We observe an experienced rat make a correct response 
at the choice point on each of a series of trials even though 
the reward varies in position from trial to trial. Clearly 
the rat has "solved" the discrimination. But the task of the 
experimenter has just begun. He is faced with two alternative 
choices in describing the learning and performance of the rat 
in the T-maze and both the theoretical problems and rewards will 
depend on the alternative the experimenter selects. The pros- 
pective theorist can either characterize discrimination learning 
in terms of stimulus selection or response selection. 

Before developing these two points of view it will be useful 
to introduce the two main variants of discrimination learning 
problems, the simultaneous and the successive discrimination 
paradigms, both shown in Figure 1. B and W stand for black and
white. The panel on the left illustrates the example we used
of the simultaneous discrimination paradigm, where the reward
is associated with one stimulus value (in this case, black)
regardless of its position. For the successive paradigm shown
in the right panel, the stimuli on each trial are identical and
the position of the reward depends upon which of the two
stimulus configurations (BB or WW) is presented. Of course
the simultaneous paradigm could also be described as one in
which the reward position depends upon which configuration is
presented.

Insert Figure 1 about here

One of the earlier mathematical models for discrimination 
learning (Gulliksen & Wolfle, 1938) described this situation in
terms of responses to configurations. Gulliksen and Wolfle argued
that it was very natural to conceive of the rat as "going left"
and "going right." According to this orientation, these two
responses come under the control of the appropriate stimulus con-
figuration. Other mathematical learning models (Bush & Mosteller,
1951; Atkinson, 1960; Bush, 1963; Sternberg, 1963) have preferred
a response selection characterization, perhaps because of its
elegant simplicity and perhaps also because of the strong tra-
dition of psychologists to speak in terms of reinforcing responses.
For either simultaneous or successive discriminations, there are
two stimulus settings (BW and WB or BB and WW) and two responses
and learning can be described succinctly in terms of the
probability of a left (or right) response given a particular 
stimulus setting. 

While response selection theories seem logical, it may be 
equally plausible to describe a discrimination task in terms of 
stimulus selection. For example, the simultaneous discrimination 
of Figure 1 could be conceived in terms of learning to choose
black and avoid white regardless of their position. Spence's
(1936) theory of discrimination learning is an example of this
approach. According to Spence, the cues of Black, White, Left 
and Right all acquire habit strength and the responses are determined 
by the stimulus complex of highest strength. 

These two distinct conceptions of the learning process have 
given rise to considerable experimental controversy which has 
served to point out serious difficulties with either approach. 
Let me briefly summarize some of the key findings. Nissen (1950) 
trained chimpanzees on a black-white simultaneous arrangement 
with the stimuli separated either in the usual horizontal orien-
tation or in a vertical plane. A given animal was trained with
a single orientation (e.g., left, right) and then given transfer
tests with the stimuli appearing in the other orientation (e.g.,
up, down). The chimpanzees showed excellent but not perfect trans-
fer of the discrimination across orientations. A response selec-
tion theory would have no basis for predicting above chance trans-
fer since during training the subject would either learn when to
go left or right or when to go up or down and the new stimulus
configurations were orthogonal to the old. A stimulus selection
view would directly predict the transfer since if a chimpanzee
learned to choose black during training, he should choose black 
during transfer regardless of its orientation. The only dif- 
ficulty might be that a stimulus selection theory might predict 
that transfer should have been perfect, which it clearly was not. 

Babb (1950) trained rats on a black-white simultaneous dis-
crimination and then gave transfer tests involving two black
(BB) or two white (WW) stimuli. He observed that his subjects
responded quickly for two positive stimuli but slowly if at all 
for two negative stimuli. A stimulus selection theory would pre- 
dict just this effect but a response selection theory would imply 
no differences in reaction to these two situations. 

Referring again to Figure 1, response selection and stimulus
selection theories would tend to differ in terms of the relative
rates of acquisition of simultaneous and successive discriminations.
According to a response selection theory, since BW and WB (simul-
taneous) should be less distinctive situations than BB and WW
(successive), simultaneous discriminations should be more dif- 
ficult than successive discrimination problems. However, if we 
think in terms of stimulus selection, for the successive problem 
Left, Right, Black, and White are rewarded equally often and it's 
difficult to see how a successive discrimination could be solved. 
In fact, Spence's theory as originally presented would predict
that the successive problem is insoluble. This awkward fact
can be averted by assuming (as Spence, 1952, did) that subjects
may form compounds of stimuli (e.g., white and left) under some
circumstances. Thus the successive problem in Figure 1 would
be solved by learning to approach "black-on-the-left" and "white-
on-the-right." Whatever else one thinks of this modification it
would seem that stimulus selection theories lead to the predic- 
tion that simultaneous discriminations should be easier than 
successive discriminations. Although there are a few exceptions 
which we shall presently consider, the weight of evidence (see 
Sutherland and Mackintosh, 1971) is strong in showing that simul- 
taneous discriminations are mastered more easily than successive 
discriminations.

The empirical evidence has tended almost uniformly to sup- 
port stimulus selection over response selection models and recent 
discrimination learning models (e.g., Zeaman & House, 1963;
Lovejoy, 1968; Sutherland and Mackintosh, 1971; Fisher and Zeaman,
1972) all fall under the framework of stimulus selection theories.
However, a certain discomfort remains because the theoretical
mechanisms (e.g., compounding) invoked to explain successive
discrimination learning play little or no role in accounts of 
simultaneous discrimination learning. Although this is probably 
not a fatal problem, it would seem incumbent on such theories to 
explain more clearly the role of compounds in learning if the 
notion of compound is to be more than a convenient explanatory 
device. When does compounding come into play? How is it modi-
fied? Since our theories are to be about processes in organisms,
the mechanisms that an animal has available to bring to bear on
a situation ought to be the same regardless of the paradigm under
consideration. To be sure, the situation may modify what the 
experimenter observes but then, we need a theory relating mechan- 
isms to particular situations. 

In the remainder of this paper, a stimulus selection theory 
is developed which gives important weight to context change or 
stimulus generalization decrement. The theory assumes no special 
compounding or configurational process and accounts for the
learning of successive discriminations without the addition of
any special process. The theory predicts the relative rates of
acquiring simultaneous and successive discrimination, including
the "exceptions," and leads to correct predictions in a number of
other paradigms. To lay the groundwork for the theory, in the 
following section we consider the idea of context change or 
stimulus generalization decrement in some detail. 

II. Context Change

There seems to be increasing awareness that memory phenomena 
may have implications for theories of learning (e.g., Estes, 1973).
Forgetting was more or less ignored in early treatment of learning,
perhaps because some early experiments seemed to show so little 
of it. Indeed, if animals are trained in a situation, receive 
little interfering training during the retention interval, and 
are tested in a situation identical to the training situation, 
they demonstrate remarkable retention up to at least 2 years 
(Liddell, 1927). 

But what happens when the training and test situations are
not identical? We know from countless stimulus generalization
studies that performance varies directly with changes in the 
relevant cue or cues. Less plausible but equally clear is the 
finding that changes in seemingly irrelevant or "background" 
features of a situation alter performance. For example, as 
early as 1917, Carr showed that performance on a spatial dis- 
crimination in a maze is lowered by 1) increases and decreases 
in illumination, 2) a change in the experimenter's position,
3) a change of position of the maze in the room, or 4) rotation 
of the maze. 

These may seem like trivial or uninteresting effects but 
there is evidence that contextual variables will be given in- 
creased weight in theorizing. Zentall (1970) demonstrated that
retroactive and proactive interference are strongly controlled
by the similarity of the learning and interfering contexts. In
the same vein, Robbins and Meyer (1971) and Glendenning and Meyer
(1971) found that retrograde amnesia and retroactive interference
were more strongly related to similarity of motivational states
than to temporal order.

The idea that contextual variables affect performance is 
not new; indeed, it is so familiar that it has a special term,
"generalization decrement." Nor has this factor been ignored by
discrimination learning theorists. Sutherland and Mackintosh (1971)
state, "when an animal learns to switch in a given analyzer, it
learns to switch it in a given situation. The rat that has
learned to respond in a jumping stand to a black-white differ-
ence will not show an increased tendency to control its responses
by responding to brightness cues in totally different situations 
such as its home cage (p. 55)." Yet context change has not been 
formally represented in any recent discrimination learning models. 
In the theory to be presented, context change is promoted to a 
major status and the implications of this are explored.

III. A Theory of Context Change in Discrimination Learning

A. Assumptions

We shall begin by stating some general assumptions about 
context change without committing ourselves to any particular 
learning model. On a general level the presentation is biased 
toward some of the recent concepts of reinforcement as discussed 
by Estes (1969a,b,c). Later on, it will become useful to per- 
form computer simulations, which, of necessity, must be stated
in terms of an explicit learning model. 

1) Learning is conceived of as a matter of acquiring
associative information concerning stimuli and outcomes. 
The effect of reward is neither direct nor automatic. 
Instead of talking in terms of strengthening responses, 
discussion will be phrased in terms of information, feed- 
back or learned expectations. 

2) In general we would like to distinguish between a cue 
and the context in which it occurs. Roughly speaking, 
a cue is what the subject responds to while context 
refers to the stimulus situation in which the cue occurs. 
In the learning situations to be considered in this
paper one can afford to remain at this loose level of
definition because most of the results are independent
of what we call cue and what we call context.

3) The amount of associative information or feedback
available from a cue in a given context is reduced
either by changes in the cue under consideration or its
context. The similarity on a given dimension of one cue
to another will be represented by a parameter whose
value is between 0 and 1, with 1 corresponding to
identity on that dimension and 0 corresponding to total
dissimilarity. The reduction due to context change
on a particular dimension will be represented in like
manner.

4) The context or cue change decrements from the various
dimensions are combined in a multiplicative manner to
yield a single similarity measure. Thus the difference
between a white triangle on the left and a black square
on the right involves a difference in position (p),
brightness (b), and form (f). Any information asso-
ciated with the white triangle would generalize to the
black square but would be reduced by these differences.
If R is the information from a reward associated with
the white triangle, then pbfR would be associated with
the black square. The difference in information
($\Delta I$) as a result of such a reward trial would be
(1-pbf)R.

5) In a discrimination learning task involving two alter-
natives we assume that the greater the difference in 
feedback or expectation associated with the two alter- 
natives on a trial, the faster will be learning to dis- 
criminate between them. This assumption is introduced 
for expository purposes only; we shall see that this 
assumption is almost always true for the particular model 
we examine, although one cannot prove this assumption to 
be generally true independent of a particular model. 
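
The multiplicative rule in assumptions 3 and 4 can be sketched in a few lines of Python. The function names and the numerical values of p, b, f, and R are illustrative assumptions and are not part of the theory as stated; the sketch only shows how per-dimension similarities multiply and how the information difference (1-pbf)R arises.

```python
from math import prod

def generalized_information(reward, similarities):
    """Information carried over to a second stimulus.

    `similarities` holds one value in [0, 1] for each dimension on
    which the two stimuli differ; the decrements combine by product.
    """
    return reward * prod(similarities)

def information_difference(reward, similarities):
    """Difference in information between the rewarded stimulus and the other one."""
    return reward - generalized_information(reward, similarities)

# Hypothetical values: white triangle on the left vs. black square on the right.
p, b, f = 0.8, 0.5, 0.6   # position, brightness, and form similarity
R = 1.0                   # information from a single reward

carried = generalized_information(R, [p, b, f])   # pbfR
delta_i = information_difference(R, [p, b, f])    # (1 - pbf)R
print(carried, delta_i)
```

With no differing dimensions the list is empty, the product is 1, and the information difference is zero, which corresponds to identity.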

B. Applications to Simultaneous and Successive Discriminations

To see how we intend to employ our assumptions, it will be
useful to refer to Figure 1. From this figure we will generate 
a matrix of stimulus similarities for first the simultaneous and 
then the successive paradigm. Table 1 is the similarity matrix 
for simultaneous discriminations. The main diagonal is, of 
course, the maximum similarity, identity. Looking across the
first row, we see that black-left differs from black-right in
position (p), from white-left in brightness (b), and from white-
right in both brightness and position (bp). From this matrix,
we can estimate the net difference in information between two
alternatives after a choice has been made. We shall use the
notation Bl to refer to black on the left, and so on.

Insert Table 1 about here

Table 1. Similarity matrix for the stimuli. The letters p
and b are parameters for position and brightness
similarity, respectively. Bl, Br, Wl, and Wr stand
for black on the left, black on the right, white on
the left, and white on the right.

Consider now a problem where the two stimulus settings
(Bl-Wr and Wl-Br) are randomly intermixed. For illustrative
purposes we focus on Bl-Wr differences. On Bl-Wr trials what-
ever outcome occurs will be associated with both stimuli
but to different degrees. If Bl is chosen, the reward will be
associated with Bl and with Wr, but to a lesser degree with Wr because
Wr differs from Bl in brightness and position. Behar (1962)
gave monkeys a series of small trial problems where every few 
trials either the correct or incorrect stimulus was replaced by 
a new correct or new incorrect object. On shift trials, monkeys 
preferred the old incorrect object to the new objects if they 
had not responded to it in the previous problem. The fact that 
it had not been chosen previously indicates that the negative 
object was not highly preferred and therefore it is plausible 
that generalization from the series of correct responses to the 
other object before the change produced the preference for old 
incorrect over new objects. If Wr is chosen, nonreward will be
associated with Wr and to a lesser extent, because of the
brightness and position differences, with Bl. The effective
information gain in either case is 1-bp.

Now consider what happens to the Bl-Wr difference on Wl-Br
trials. When Br is chosen both Bl and Wr are associated with
reward, the generalization to Bl being diminished by a position
difference (p) and the generalization to Wr by a difference in
brightness (b). Likewise nonreward is associated with Bl and Wr
when Wl is chosen. The effective information gain is p-b in
either case. We can see that the amount by which Wl-Br training
facilitates the Bl-Wr discrimination increases with position
similarity and decreases with brightness similarity.

Considering a response to both the Bl-Wr and the Wl-Br
settings the information gain ($\Delta I$) will be 1-bp+p-b which
factors into:

$\Delta I = (1+p)(1-b)$ (1)

From this we anticipate that simultaneous discrimination will
be solved more rapidly, the greater the distinctiveness of the
relevant cue (the smaller b) and the greater the similarity of
the positional cues (the larger p). Evidence on the first point
is so common that I will only cite MacCaslin (1954) who showed
that brightness discrimination in rats was related to brightness 
similarity. 
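
The bookkeeping behind equation 1 can be checked numerically from the similarity matrix described for Table 1. The sketch below is an illustration only: coding each stimulus as a (brightness, position) pair, the function names, and the values chosen for p and b are assumptions introduced here.

```python
def similarity(x, y, p, b):
    """Similarity of two stimuli coded as (brightness, position) pairs."""
    value = 1.0
    if x[0] != y[0]:
        value *= b   # brightness differs
    if x[1] != y[1]:
        value *= p   # position differs
    return value

def simultaneous_gain(p, b):
    """Net Bl-Wr information difference summed over both settings."""
    Bl, Br, Wr = ("B", "L"), ("B", "R"), ("W", "R")
    # Bl-Wr trials: the outcome attaches fully to the chosen stimulus
    # and generalizes to the other one with similarity bp.
    own_setting = similarity(Bl, Bl, p, b) - similarity(Bl, Wr, p, b)    # 1 - bp
    # Wl-Br trials: reward for Br generalizes to Bl (p) and to Wr (b).
    other_setting = similarity(Br, Bl, p, b) - similarity(Br, Wr, p, b)  # p - b
    return own_setting + other_setting

p, b = 0.7, 0.4   # hypothetical similarity values
print(simultaneous_gain(p, b), (1 + p) * (1 - b))
```

The two printed numbers agree up to rounding, which is the factoring of 1-bp+p-b into (1+p)(1-b).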

There is almost no data on the effects of position similarity 
on a nonspatial discrimination. Spiker and Lubker (1965) found
a 0-inch separation to be better than a 10-inch separation
(edge to edge) of stimuli for a brightness discrimination with
children in terms of trials to criterion (7.H vs. H.8) but the
trend was not statistically reliable. Using rats in a jumping
stand, Elias and Stein (1968) obtained clear position similarity
effects. The 4-choice discrimination used either a 2 5/8-inch or
a 4 5/8-inch center to center stimulus separation. A triangle-
circle discrimination was mastered much more rapidly for the 
larger separation and a diamond-square discrimination produced 
evidence for learning only in the case of the larger stimulus 
separation. 

The successive problem, shown in the right panel of Figure 1,
can also be examined in terms of Table 1. The information dif-
ference between Bl and Br on Bl-Br trials is equal to 1-p, and
on Wl-Wr trials the information difference between Bl and Br is
bp-b. The latter term is negative and implies that Wl-Wr train-
ing would tend to interfere with the Bl-Br performance, and more
so, the greater the brightness similarity. (If b were equal
to 1 then this term would be p-1 and the $\Delta I$ would equal zero,
implying an insoluble problem.) From the Bl-Br and the Wl-Wr
settings we find that the total information gain will be 1-p+bp-b
which factors into:

$\Delta I = (1-p)(1-b)$. (2)

From equation 2 one would expect that performance on suc-
cessive discrimination would decrease with both position and 
brightness similarity. MacCaslin (1954) has shown that brightness 
similarity impairs successive brightness discriminations. 
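
The two terms that make up equation 2 can be tabulated over a small grid to show both predicted effects. This is a sketch under assumed values; the function name and the grid points are hypothetical.

```python
def successive_gain(p, b):
    """Net Bl-Br information difference for the successive problem."""
    own_setting = 1 - p          # Bl-Br trials
    other_setting = b * p - b    # Wl-Wr trials; never positive, so it interferes
    return own_setting + other_setting

# Hypothetical similarity values on a coarse grid.
for p in (0.2, 0.5, 0.8):
    row = [round(successive_gain(p, b), 3) for b in (0.0, 0.5, 1.0)]
    print(p, row)
```

Each row falls as b grows and reaches zero at b = 1, and the rows shrink as p grows, matching the expectation that both kinds of similarity hurt the successive problem.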

As for the prediction that position similarity impairs suc-
cessive discrimination, there is virtually no data. A nonsignif-
icant trend in the predicted direction was reported by Spiker and
Lubker (1965) in a study using children as subjects. I know of
no relevant animal data. 

One can use equations 1 and 2 to compare simultaneous and 
successive discriminations. Since (1) will be larger than (2) 
except for the case when p is zero (or b is 1), we are led to
predict that simultaneous discriminations should be easier than
successive discriminations, which generally has been found (e.g.,
Spence, 1952; MacCaslin, 1954; Bitterman, Tyler, & Elam, 1955;
Warren & Barron, 1955; Lipsitt, 1961; Spiker & Lubker, 1965;
Price & Spiker, 1967).

There is an interesting exception to this general rule: 
when the choice responses are not to the relevant cues directly, 
successive discriminations are found to be easier than simultaneous 
discrimination problems (e.g., Bitterman & Wodinsky, 1953; Wodinsky,
Varley, & Bitterman, 1954; Bitterman, Tyler, & Elam, 1955; Lipsitt,
1961). An example of this situation taken from Bitterman and
Wodinsky (1953) is shown in Figure 2.

Insert Figure 2 about here

Responses are to the grey stimuli rather than to the black 
and white cues. If we apply the logic for information differences 
between Gl and Gr on the top of either panel, we obtain 1-p for
trials on the top setting and p-1 for trials on the bottom set-
ting, for a net difference of zero. However, the center black
and white stimuli differ between settings, and introducing c for
the similarity of the center contexts, the Gl-Gr difference on the
top remains 1-p for responses to the top setting and becomes
c(p-1) for responses to the bottom setting. The net difference
then becomes

$\Delta I = (1-p)(1-c)$. (3)

From equation 3 we expect that the rate of learning will in-
crease with position distinctiveness and will increase the greater 
the difference in center contexts between the top and bottom set- 
tings of the discrimination. It seems plausible to me that
this difference would be greater for the successive discrimination
than the simultaneous paradigm, from which would follow the re-
sults that in these circumstances, successive discriminations are
solved more rapidly. 
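
The role of the center-context parameter c in equation 3 can be illustrated as follows. The specific values of p and c are hypothetical; in particular, giving the successive arrangement the smaller c simply encodes the conjecture stated in the text and is not an estimate from data.

```python
def grey_choice_gain(p, c):
    """Net Gl-Gr information difference when responses go to the grey stimuli."""
    top_setting = 1 - p            # trials on the top setting
    bottom_setting = c * (p - 1)   # trials on the bottom setting, scaled by context similarity c
    return top_setting + bottom_setting

p = 0.6                                    # hypothetical position similarity
c_successive, c_simultaneous = 0.3, 0.8    # hypothetical center-context similarities
print(grey_choice_gain(p, c_successive), grey_choice_gain(p, c_simultaneous))
```

When c equals 1 the two terms cancel, which is the net difference of zero obtained before the center contexts were taken into account.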

Before we consider other detailed predictions from various 
paradigms, one should pause and note that the model as presented 
is not treating simultaneous and successive discriminations in 
any different manner. No new mechanisms are brought into play 
for one paradigm and not the other. Although this presentation 
may have built in some descriptive biases it is even the case 
that it has not been necessary to distinguish between cue and 
context. Equation 1 results whether we assume that black and 
white are cues and all else is context, or left and right are 
cues and all else context, or black plus left is one cue, white 
plus left another and so on. Whether this is a virtue or a
flaw cannot be concluded just yet. Within this flexibility we 
can generate a surprising number of predictions for which 
there is quite good support. 

C. Further predictions for simultaneous and successive 
discriminations 

We now consider procedures where either irrelevant or redun-
dant relevant cues are added to simultaneous and successive dis- 
criminations. Figure 3 displays discriminations where both size 
and brightness are relevant cues. We derive a measure of the net 
information gain exactly as before. For the simultaneous problem
in the left panel the difference between the top two stimuli
from the top setting is 1-psb, where s represents the size sim-
ilarity parameter, and the difference between the top two stimuli
owing to generalization from a trial on the bottom setting is
p-sb so that

$\Delta I = (1+p)(1-sb)$. (4)

By similar means we can find the value for the successive paradigm
in the right panel of Figure 3 to be

$\Delta I = (1-p)(1-sb)$. (5)

Insert Figure 3 about here

Comparing equations 4 and 5 to equations 1 and 2, we con-
clude that the information gain will be greater and the dis-
crimination will be solved more quickly when redundant relevant
cues are present. This prediction has been repeatedly confirmed
for the simultaneous discriminations (see Bower & Trabasso, for
a review) and has also been demonstrated for successive dis- 
criminations (Warren, 1964; Lubker, 1969). 
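
The comparison of equations 4 and 5 with equations 1 and 2 amounts to setting s to 1 to remove the redundant size cue. A minimal sketch, with a hypothetical function name and hypothetical parameter values:

```python
def gain(p, b, s=1.0, successive=False):
    """Equations 4 and 5; with s = 1 they reduce to equations 1 and 2."""
    position_term = (1 - p) if successive else (1 + p)
    return position_term * (1 - s * b)

p, b, s = 0.7, 0.5, 0.6   # hypothetical similarity values
for successive in (False, True):
    without_size = gain(p, b, successive=successive)
    with_size = gain(p, b, s, successive=successive)
    print(successive, without_size, with_size, with_size - without_size)
```

For any s below 1 the last printed value is positive in both paradigms, which is the predicted benefit of a redundant relevant cue.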

Irrelevant dimensions may be added to these discriminations 
in a number of ways. In Figure 4, the irrelevant size cue is
confounded with spatial position.

Insert Figure 4 about here

Deriving the value for net information gain from the two
settings we find

$\Delta I = (1+sp)(1-b)$ (6)

for the simultaneous case and

$\Delta I = (1-sp)(1-b)$ (7)

for the successive problem. Again referring to equations 1 and
2, one can see that simultaneous discriminations should be impaired
by the spatially confounded irrelevant cue but that successive
should be facilitated by the irrelevant size cue. Both these 
predictions were confirmed in a single experiment with children 
by Price and Spiker (1967). 
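
The opposite effects of the confounded size cue follow from replacing p with sp, as a short sketch makes visible. The function names and parameter values are illustrative assumptions.

```python
def confounded_gain(p, b, s, successive=False):
    """Equations 6 and 7: size is confounded with position, so p is replaced by sp."""
    sp = s * p
    return ((1 - sp) if successive else (1 + sp)) * (1 - b)

def baseline_gain(p, b, successive=False):
    """Equations 1 and 2."""
    return ((1 - p) if successive else (1 + p)) * (1 - b)

p, b, s = 0.7, 0.5, 0.6   # hypothetical similarity values
sim_change = confounded_gain(p, b, s) - baseline_gain(p, b)
suc_change = confounded_gain(p, b, s, True) - baseline_gain(p, b, True)
print(sim_change, suc_change)
```

The first change is negative and the second positive; under these equations the two changes are equal in size, since each is p(1-s)(1-b) with opposite sign.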

By adding two new settings to the basic paradigms one can 
produce irrelevant cues not confounded with spatial position as 
is shown in Figure 5. 

Insert Figure 5 about here 

The procedure for deriving the net information difference is the 
same as before with the difference being derived from all four 
settings. The result for the simultaneous paradigm is 

$\Delta I = (1+p)(1-b)(1+s)$ (8)

and for the successive paradigm it is

$\Delta I = (1-p)(1-b)(1+s)$. (9)

Here the proper control comparison is a four-setting problem
where s=1, from which we can predict that the irrelevant dimen-
sion not confounded with position will impair performance on both 
simultaneous and successive discriminations. For simultaneous 
problems favorable evidence on this prediction has been obtained 
by Lawrence and Mason (1955), Lubker (1967), and Price and Spiker
(1967), the latter study showing that the more distinctive the
irrelevant cue, the greater the impairment. As far as I can
ascertain this prediction has been assessed but once in the suc-
cessive paradigm and the result was that children's performance
was worse in the presence of irrelevant cues (Tragakis, 1968).

Another choice point in introducing irrelevant dimensions 
is deciding whether they shall vary within or between settings.
In the left panel of Figure 6 the irrelevant brightness dimen-
sion varies between settings while it varies within settings in
the right panel.

Insert Figure 6 about here

The values for net information gain are

$\Delta I = 1-sp+bp-sb$ (10)

for the variable-between problem and

$\Delta I = 1-sbp+bp-s$ (11)

for the variable-within problem. Comparing equations 10 and 11
we find that 10 exceeds 11 by s(1-p)(1-b). Since this value is
positive, variable-between problems should be easier than variable-
within discriminations and this difference should increase with
size similarity. At the extreme case of s=1, the variable-
between problem becomes a successive discrimination, while the
variable-within problem becomes insoluble. Spiker and Lubker
(1964) found that size similarity did hurt performance, that
variable-between was easier than variable-within, and that this 
difference increased as size similarity increased. 
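
The algebra relating equations 10 and 11 can be verified by brute force over a grid of parameter values. The function names and the grid are assumptions made for this check.

```python
import itertools

def variable_between(p, b, s):
    return 1 - s * p + b * p - s * b      # equation 10

def variable_within(p, b, s):
    return 1 - s * b * p + b * p - s      # equation 11

grid = (0.0, 0.25, 0.5, 0.75, 1.0)
for p, b, s in itertools.product(grid, repeat=3):
    difference = variable_between(p, b, s) - variable_within(p, b, s)
    # Equation 10 exceeds equation 11 by s(1-p)(1-b).
    assert abs(difference - s * (1 - p) * (1 - b)) < 1e-12

# Extreme case s = 1: variable-between reduces to equation 2,
# and variable-within gives no information gain at all.
print(variable_between(0.5, 0.25, 1.0), variable_within(0.5, 0.25, 1.0))
```

The loop completes without an assertion error, so the stated difference holds at every grid point.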

A more usual manner of comparing variable-within versus 
variable-between irrelevant cues uses four stimulus settings as 
in Figure 7. The information analysis now yields the same
equation for either manipulation,

$\Delta I = (1-s)(1+p)(1+b)$, (12)

and consequently we would predict no differences in these two
paradigms. However, in making this comparison we run into a
serious problem. We cannot assume that a given subject has
equal preferences for white and black stimuli. In other situa-
tions one can perform a counterbalancing manipulation and hope
preferences are cancelled out. The fact that the correct
stimuli are sometimes white and sometimes black does not mean
that preferences are controlled for. We shall confirm this
problem in discussing the computer simulations and for the
moment equivocate by stating that one cannot make a clear-cut
prediction. Neither are the relevant data so clear. Lubker
(1964) and Shepp and Gray (1971) obtained some data favoring
variable-between over variable-within but Evans and Beedle (1970)
found no difference for retarded girls and variable-within better
than variable-between for retarded boys.

Insert Figure 7 about here

This last prediction aside, our analysis appears to be quite 
promising since a number of predictions which are not intuitively 
obvious have been confirmed, and we have failed to note any fatal 
difficulties in our approach. We turn now to other paradigms and 
controversies for which the model is applicable.

Further Applications

Appreciable interest has centered around component versus con-
figurational learning, which loosely corresponds to a stimulus
versus response selection approach. Figure 8 shows the design
of an experiment by Birch and Vandenberg (1955) which appeared
to yield contradictory results. Subjects were trained on the
discrimination Gl-Wr and Bl-Gr shown at the top of the figure
which could be solved either by learning that black and white
were correct or by learning to go right for a bright array and left
for a dark array. The lower left problems (1 and 2) were tests
to see if learning had been configurational. If subjects had
learned to choose right for light stimuli and left for dark
stimuli, then subjects given transfer condition 1 should perform
better than subjects given transfer condition 2 at the start of
transfer. This is what Birch and Vandenberg observed. Likewise
one might expect that subjects would do better in transfer situa-
tion 4 than in 3 if they had learned "light-go right, dark-go left."
Yet subjects performed better on task 3 than 4 as if they had
learned to choose black and white rather than learning a con-
figuration.

Insert Figure 8 about here

We can apply the present model to these data by measuring the 
similarity of the training and transfer situations. First taking 
transfer situations 1 and 2, the Wr-Wl difference from the Gl-Wr 
setting is 1-p and from the Bl-Gr setting is pb-b, yielding a net 
difference of (1-p)(1-b), which is positive, so that the model 
predicts that situation 1 will have an advantage over situation 2 
at the start of transfer. Similar empirical results have been 
reported by Johnson (1962) and Lubker (1964). 
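
The arithmetic behind the net difference can be written out in one line. The following is only a worked check of the sum of the two contributions named above; the restriction of p and b to the interval from 0 to 1 is an assumption based on their use as similarity parameters.

```latex
\begin{aligned}
(1-p) + (pb - b) &= 1 - p - b + pb \\
                 &= (1-p)(1-b) \;>\; 0
                 \qquad \text{for } 0 \le p < 1,\; 0 \le b < 1 .
\end{aligned}
```

The advantage for situation 1 therefore shrinks as either position similarity or brightness similarity approaches 1.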
Transfer in situations 3 and 4 is slightly more complicated 
because there are both black-gray and white-gray differences, 
which we will assume to be equal, and a black-white difference 
which would be expected to be larger than the other differences. 
We represent the black-white similarity by b and the black-gray 
and white-gray similarities by b+X, where X is greater than zero 
and less than 1-b. That is, gray and black are assumed to be 
more similar than white and black. Now one can obtain that the 
Wl-Gr difference is equal to [p(1-b)-X(1-p)] and since X is less 
than 1-b, the difference in initial information must be such that 

    \Delta I > (1-b)(2p-1).    (13) 

Therefore whenever p is greater than one half we would predict 
better transfer to situation 3 than 4. This means that under 
this restriction both of the main results of Birch and Vandenberg 
follow from the model. 
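
The step from the bracketed difference to inequality 13 can be spelled out as follows. This is a sketch that uses only the bounds stated in the text (X between 0 and 1-b) plus the assumption that p is less than 1; the symbol \Delta I for the difference in information is a reading of the printed equation.

```latex
\begin{aligned}
\Delta I &= p(1-b) - X(1-p) \\
         &> p(1-b) - (1-b)(1-p)
            && \text{since } 0 < X < 1-b \text{ and } 1-p > 0 \\
         &= (1-b)\,\bigl(p - (1-p)\bigr) \\
         &= (1-b)(2p-1) .
\end{aligned}
```

The condition p > 1/2 is sufficient rather than necessary: it guarantees a positive lower bound, but for small X the difference can remain positive at smaller values of p.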

A related, more frequently used paradigm is shown in Figure 9. 

Insert Figure 9 about here 

R, Y, B, and W stand for Red, Yellow, Black, and White but there 
is no special significance to these particular colors. After 
training on the top problem subjects are given transfer to both 
stimulus arrangements. Experimenters then measure whether 
subjects continue to select the same rewarded stimuli (Red and 
White) or whether they select the same responses to the stimulus 
configurations. To obtain transfer predictions we use C for color 
similarity and use Cw and Cb to distinguish within and between 
setting color similarity since that has been an experimental 
variable. At the beginning of transfer the difference in reward 
information between Rr and Yl would be 

    \Delta I = p - C_w + C_b - pC_b.    (14) 

From equation 14 we predict that a stimulus selection 
solution (choice of red) will 1) decrease with within-setting 
color similarity, 2) increase with between-pair similarity, and 
3) increase with position similarity. I know of no evidence on 
the second and third predictions but favorable results on the 
first prediction have been reported by Turbeville, Calvin, and 
Bitterman (1952); White and Spiker (1960); Teas and Bitterman 
(1962); and Zeiler and Paul (1955). 
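
The three predictions can be read off the signs of the partial derivatives of equation 14. The sketch below assumes the equation is read as \Delta I = p - C_w + C_b - pC_b, and that p and C_b are similarity parameters strictly between 0 and 1.

```latex
\begin{aligned}
\Delta I &= p - C_w + C_b - p\,C_b , \\[4pt]
\frac{\partial \Delta I}{\partial C_w} &= -1 < 0
   && \text{(within-setting similarity lowers choice of red)} \\
\frac{\partial \Delta I}{\partial C_b} &= 1 - p > 0
   && \text{(between-pair similarity raises it)} \\
\frac{\partial \Delta I}{\partial p} &= 1 - C_b > 0
   && \text{(position similarity raises it)} .
\end{aligned}
```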

Note that in terms of the model, compound and configurational 
solution modes are more properly thought of as properties of 
stimuli than of subjects as such. If the within pair similarity 
varies in the two settings, then it is quite possible that one 
might obtain "configurational responding" for one pair and a 
"component solution" for the other setting pair. This outcome has 
been observed by Liu and Zeiler (1968) and Campione, McGrath, 
and Rabinowitz (1971). Developmental shifts in patterns of 
responding (e.g., Zeiler, 1964) might simply reflect shifts in 
the salience of particular dimensions, such as a decrease in the 
salience of position cues. 

Quite complicated variations of simultaneous and successive 
discriminations have been shown to be soluble. Figures 10 and 
11 show what might be called conditional simultaneous and 
conditional successive discriminations. 

Insert Figures 10 and 11 about here 

The conditional simultaneous problem can be described as large 
is correct for white stimuli, small is correct for black stimuli. 
The information value equation for this paradigm is 

    \Delta I = (1+p)(1-S)(1-b).    (15) 

Thus there is a net gain of information on this problem which 
Lashley (1958) long ago showed was soluble. It remains for 
the computer simulation to show that we can predict that this 
problem is soluble. Hoyt (1960, 1962) found that brightness 
distinctiveness facilitates performance on this problem as 
equation 15 implies. 

The conditional successive problem in Figure 11 can be 
solved as follows: for small stimuli, black-go left, white-go 
right, while for large stimuli, black-go right and white-go left. 
The appropriate equation is 

    \Delta I = (1-p)(1-S)(1-b).    (16) 

Flagg (1973) ran monkeys on the paradigm shown in Figure 11 and 
found that the problem was difficult but soluble. We demonstrate 
that this can be predicted from the model in the computer 
simulation. 

More taxing of the theory is the transverse patterning 
problem where stimulus A is correct in one pair and incorrect 
in another, B is correct in one and incorrect in another, and C 
is correct in one and incorrect in another, i.e., A+B-, B+C-, 
C+A-. 
These problems are soluble by chimpanzees (e.g., Nissen, 1942) 
but would not follow from the development of the theory up to 
here. However, these data may be very much in the spirit of 
a context-sensitive theory. We would predict that transverse 
patterning could be mastered if A in the presence of B were a 
slightly different context from A in the presence of C. This 
could account both for the solution and the difficulty of 
transverse patterning problems. 

Before summing up, we turn to a computer simulation which 
embodies the context-sensitive theory. We do this to get away 
from simply talking of information which is a useful device for 
drawing out predictions but too imprecise to have much rigor. 
IV. Computer Simulation of the Model 

Since the feedback or scanning model has been discussed 
elsewhere (Estes, 1962, 1956), only modifications will be 
discussed here. The model assumes that on a given trial, the 
subject scans the available choices, generating a feedback 
(covert prediction of reward value) for each choice, and makes 
the response which he predicts will yield the highest feedback. 
If the subject responds to choice i on Trial n, and is rewarded, 
feedback F for choice i changes with the following linear 
operation: 

If the subject responds to cue i on Trial n and is not rewarded, 
the linear operator applies as follows: 

    F_{i,n+1} = F_{i,n}(1-\theta).    (16) 

The feedback for all choice alternatives changes on each trial 
as a function of their similarity to the alternative chosen. 
If Sij is the total similarity (obtained by multiplying stimulus 
similarity parameters from each of the dimensions) of choice j 
to choice i, then if choice i is selected feedback for choice i 
changes by θ and feedback for choice j changes by Sijθ. This 
applies equally for within and between setting cues. 
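
The generalized update can be sketched in a few lines of Python. This is a minimal illustration, not the original program: the function names are hypothetical, the learning rate is written theta, and the form of the reward operator (moving feedback toward 1 at the same rate that nonreward moves it toward 0) is an assumption, since only the nonreward operator is legible in the text.

```python
from math import prod


def total_similarity(dimension_similarities):
    """Product rule: multiply the similarity parameters of each dimension."""
    return prod(dimension_similarities)


def update_feedback(feedback, chosen, rewarded, theta, similarity):
    """Return the feedback values after one trial.

    feedback   -- list of current feedback values, one per choice
    chosen     -- index of the choice the subject made
    rewarded   -- True if the choice was rewarded
    theta      -- learning rate parameter
    similarity -- similarity[i][j] is the total similarity of choice j
                  to choice i, with similarity[i][i] == 1
    """
    updated = []
    for j, f_j in enumerate(feedback):
        rate = similarity[chosen][j] * theta  # generalization scales the rate
        if rewarded:
            # Assumed reward operator: move toward 1.
            updated.append(f_j + rate * (1.0 - f_j))
        else:
            # Nonreward operator from the text: move toward 0.
            updated.append(f_j * (1.0 - rate))
    return updated
```

With two choices of similarity .20, feedback values of .50 and theta = .50, a rewarded choice of the first alternative raises its feedback to .75 and the other alternative's feedback to .55.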

Feedback values for choices are combined to yield overt 
responses in a manner consistent with the discussion above and as 
treated by Estes (1962). Since we shall only be looking at 
two-choice situations, only its equation shall be given: 

    P_a = \frac{F_a(1-F_b)}{F_a(1-F_b) + F_b(1-F_a)},    (17) 

where Pa is the probability of selecting choice A over choice B 
given feedback values Fa and Fb. Note that when feedback for a 
choice reaches unity it will always be selected independent 
of the feedback for the other choice (unless its value is also 
unity). 
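
The two-choice rule is easy to check numerically. The sketch below is illustrative only; the function name is hypothetical, and returning .50 when the denominator is zero (both values at unity or both at zero) is an assumption, since the text leaves that case open.

```python
def choice_probability(f_a, f_b):
    """Probability of selecting choice A over choice B (equation 17)."""
    numerator = f_a * (1.0 - f_b)
    denominator = numerator + f_b * (1.0 - f_a)
    if denominator == 0.0:
        # Assumed tie rule: the text does not specify this case.
        return 0.5
    return numerator / denominator
```

Equal feedback values give a probability of .50, and a feedback value of unity for A gives a probability of 1 whatever the value for B, as the note above states.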

To avoid unnecessary redundancy, the results from simulations 
will only be briefly presented. The format will be a summary 
statement, the parameters used (b, p, and S from before plus a 
learning rate parameter, θ), the results, and a reference 
equation and figure. For some novel findings a little more 
discussion will be provided. The initial feedback values for 
each of the choices were set at .50. In each case 50 statistical 
subjects were run for 20 trials on each setting or until 
learning was complete. 
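
To make the procedure concrete, here is a hedged sketch of how a simultaneous brightness discrimination might be simulated along these lines. It is not the original program. The cue coding (brightness by position), the random ordering of the two settings, black as the rewarded stimulus, the form of the reward operator, and the tie rule are all assumptions, and there is no stopping criterion, so the mean errors it produces need not match the tabled values.

```python
import random
from itertools import product

# Four cues: each brightness at each position (assumed coding).
CUES = list(product(("black", "white"), ("left", "right")))


def similarity(cue_1, cue_2, b, p):
    """Product rule over the brightness and position dimensions."""
    s = 1.0
    if cue_1[0] != cue_2[0]:
        s *= b  # brightness similarity
    if cue_1[1] != cue_2[1]:
        s *= p  # position similarity
    return s


def choice_probability(f_a, f_b):
    """Two-choice rule; 0.5 on a zero denominator is an assumption."""
    numerator = f_a * (1.0 - f_b)
    denominator = numerator + f_b * (1.0 - f_a)
    return 0.5 if denominator == 0.0 else numerator / denominator


def run_subject(b, p, theta, trials_per_setting=20, rng=random):
    """Count errors for one statistical subject (black is rewarded)."""
    feedback = {cue: 0.5 for cue in CUES}
    settings = [
        (("black", "left"), ("white", "right")),
        (("black", "right"), ("white", "left")),
    ]
    errors = 0
    for _ in range(trials_per_setting):
        # Present both settings once per block, in random order.
        for correct, wrong in rng.sample(settings, k=2):
            p_correct = choice_probability(feedback[correct], feedback[wrong])
            picked_correct = rng.random() < p_correct
            chosen = correct if picked_correct else wrong
            errors += not picked_correct
            for cue in CUES:
                rate = theta * similarity(chosen, cue, b, p)
                if picked_correct:
                    feedback[cue] += rate * (1.0 - feedback[cue])  # assumed
                else:
                    feedback[cue] *= 1.0 - rate
    return errors


def mean_errors(b, p, theta, n_subjects=50, seed=0):
    """Mean errors over n_subjects simulated subjects."""
    rng = random.Random(seed)
    total = sum(run_subject(b, p, theta, rng=rng) for _ in range(n_subjects))
    return total / n_subjects
```

Calling `mean_errors(0.20, 0.30, 0.50)` runs 50 subjects with a fixed seed; varying b and p while holding theta constant gives a rough feel for how the similarity parameters trade off.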

Statement                   Parameters                Results         Equation  Figure
                                                      [Mean Errors]

Simultaneous:
1. Position similarity      θ=.50
   helps                    b=.20, p=.30              2.06            1         1 (left)
                            b=.20, p=.70              1.48

2. Brightness similarity    θ=.50
   hurts                    b=.00, p=.30              1.46            1         1 (left)
                            b=.20, p=.30              2.06
                            b=.50, p=.30              7.14

3. Redundant relevant       θ=.50
   cues helps               b=.20, p=.30, S=1.00      2.06                      3 (left)
                            b=.20, p=.30, S=.30       1.66

Successive:
1. Position similarity      θ=.50
   hurts                    b=.20, p=.30              5.52            2         1 (right)
                            b=.20, p=.50              17.30

2. Brightness similarity    θ=.50
   hurts                    b=.00, p=.30              2.72            2         1 (right)
                            b=.20, p=.30              5.52
                            b=.50, p=.30              38.26

3. Redundant relevant       θ=.50
   cue helps                b=.20, p=.30, S=1.00      5.52            5         3 (right)
                            b=.20, p=.30, S=.30       3.64

Statement                   Parameters                Results         Equation  Figure
                                                      [Mean Errors]

Simultaneous vs. Successive
1. Simultaneous easier      θ=.50, p=.30
   than successive and      b=.00                     succ./sim.=1.86 1, 2
   more so as the           b=.20                     succ./sim.=2.68
   distinctiveness of the   b=.50                     succ./sim.=5.36
   relevant cue lessens
   as found by MacCaslin
   (1954)

2. Irrelevant cues          θ=.50
   confounded with
   position hurts
   simultaneous and helps
   successive:
     Simultaneous           b=.20, p=.30, S=1.00      2.06            8         4 (left)
                            b=.20, p=.30, S=.30       2.90
     Successive             b=.20, p=.30, S=1.00      5.52                      4 (right)
                            b=.20, p=.30, S=.30       3.40

3. Irrelevant cues not      θ=.50
   confounded with
   position hurts both
   simultaneous and
   successive:
     Simultaneous           b=.20, p=.30, S=1.00      1.96                      5 (left)
                            b=.20, p=.30, S=.30       3.66
     Successive             b=.20, p=.30, S=1.00                                5 (right)
                            b=.20, p=.30, S=.30       11.68

Statement                   Parameters                Results         Equation  Figure
                                                      [Mean Errors]

Two-setting Variable-Between and Variable-Within Irrelevant Cues
1. Similarity hurts         θ=.25
   variable-between         b=.40, p=.10, S=.10       4.12            10        6 (left)
                            b=.40, p=.10, S=.40       4.40
   and variable-within      b=.10, p=.40, S=.10       4.58            11        6 (right)
                            b=.10, p=.40, S=.40       8.12

2. Variable-between         same as above
   easier and more so as    S=.10                     Within/Between  10
   similarity increases                               = 1.12
                            S=.40                     Within/Between  11
                                                      = 1.85

Four-setting Variable-Between versus Variable-Within Irrelevant Cues
1. No difference            θ=.50
   predicted if initial
   preferences (feedback
   values) are equal
     Variable-Between       b=.20, p=.20, S=.20       4.22            12        7 (left)
     Variable-Within        b=.20, p=.20, S=.20       4.10            12        7 (right)

2. Initial preferences      θ=.50
   favor variable-between
     Variable-Between       b=.20, p=.20, S=.20       4.06            12        7 (left)
     Variable-Within        b=.20, p=.20, S=.20       4.44            12        7 (right)

   Fi,0 = .70 for all black cues
   Fi,0 = .30 for all white cues

Statement                   Parameters                Results         Equation  Figure
                                                      [Mean Errors]

Conditional Simultaneous and Conditional Successive
1. Conditional              θ=.50
   simultaneous will be
   easier than
   conditional successive
     Simultaneous           b=.20, p=.20, S=.20       9.6             15        10
     Successive             b=.20, p=.20, S=.20       25.36 (in 80    16        11
                                                      trials and
                                                      still not
                                                      solved)

2. Both are soluble
     Simultaneous           see above
     Successive             θ=.80                     2.?8            16        11
                            b=.00, p=.00, S=.00

In summary, the computer simulation confirms the predictions 
of the context theory of discrimination learning. The information 
equations are useful for expository purposes and to obtain a 
shortcut view of various paradigms but the computer simulation is 
really the proof of the pudding. The simulation is quite general 
and can be used to investigate a number of alternative paradigms 
and conditions, including probabilistic reinforcement tasks and 
n-alternative discriminations for which the utility of the 
information equations is less clear. Space prevents an 
exploration of some of these other results, although I hope to 
report on them in the near future. Overall no major shortcomings 
in the model have been evidenced and it is appropriate to turn to 
a discussion of alternative theories and the implications of the 
present theory. 
V. Discussion 

The introduction has served to show that most previous 
discrimination learning models have implied different processes 
operating in simultaneous and successive problem learning. Other 
theorists have discussed generalization decrement but have not 
incorporated this concept into models. The stimulus-interaction 
hypothesis of Spiker (1963, 1970), which is formulated from a 
Hull-Spence orientation, comes closest to the present theory. 
Spiker assumes that the habit strength accruing to a stimulus 
component from direct reinforcement of a compound will be reduced 
when that component appears in a different compound and that the 
amount of reduction will increase with the average dissimilarity 
between the corresponding elements in the two stimulus compounds. 
Aside from being formulated from a Hull-Spence point of view, 
Spiker's theory differs from the present theory in that Spiker 
uses an average dissimilarity measure rather than a product rule 
for combining dissimilarities. The averaging rule leads to two 
clearly incorrect predictions: 1) that adding irrelevant 
nonspatial dimensions to successive discriminations will have no 
effect; and 2) that a conditional successive discrimination is 
insoluble. The product measure of similarity contains neither 
defect and is probably simpler to work with. 

The context-sensitive theory has direct implications for 
research on "types" of learning process. Component, compound, 
and configurational learning emerge as summary descriptions of 
performance in different situations but according to the present 
theory are neither styles nor distinct types of learning since data 
from the various situations are predicted by a single process. 
It may be important to revise our thinking about compounds, 
components, and configurations, at the very least from the point 
of view of what might be evidence for one or another process. 

Although the present model has had wide-ranging success in 
predicting experimental results, it is general enough and simple 
enough that its ideas might be incorporated into current dis- 
crimination models. I have only made the barest beginnings on 
this task but my guess is that some awkward assumptions may be 
dropped and the range of applicability of these theories will be 
broadened. Since the distinction between cue and context is 
not tightly drawn in the present framework (at least for these 
situations) there is room for a considerable range in assumptions 
concerning selectivity in learning and performance (i.e., de- 
fining the functional cues). 



